ARRAU: Linguistically-Motivated Annotation of Anaphoric Descriptions
Abstract
This paper presents the second release of the ARRAU dataset: a multi-domain corpus with thorough, linguistically motivated annotation of anaphora and related phenomena. Building upon the first release almost a decade ago, considerable effort has been invested in improving the data both quantitatively and qualitatively. We have doubled the corpus size, expanded the range of covered phenomena to include referentiality and genericity, and designed and implemented a methodology for enforcing the consistency of the manual annotation. We believe that the new release of ARRAU provides valuable material for ongoing research on complex cases of coreference as well as for a variety of related tasks. The corpus is publicly available through LDC.
Related resources
Anaphoric Annotation in the ARRAU Corpus
Arrau is a new corpus annotated for anaphoric relations, with information about agreement and explicit representation of multiple antecedents for ambiguous anaphoric expressions and discourse antecedents for expressions which refer to abstract entities such as events, actions and plans. The corpus contains texts from different genres: task-oriented dialogues from the Trains-91 and Trains-93 cor...
Processing definite descriptions in corpora
We discuss in this paper a system that resolves definite descriptions in written texts. A preliminary study of definite descriptions in a collection of 20 texts revealed that about 30% of the 1040 definites in the collection were cases of anaphoric definites whose antecedents had the same head noun, and 50% introduced novel discourse referents. An algorithm which resolves anaphoric definite des...
Performance and limitations of the linguistically motivated Cocoa/Peaberry system in a broad biological domain
We tested a linguistically motivated rule-based system in the Cancer Genetics task of the BioNLP13 shared task challenge. The performance of the system was very moderate, ranging from 52% against the development set to 45% against the test set. Interestingly, the performance of the system did not change appreciably when using only entities tagged by the inbuilt tagger as compared to performance ...
Textual co-reference annotation: a study on definite descriptions
In the linguistic literature many different uses of definite descriptions are acknowledged and explained (Fraurud 1990, Hawkins 1978, Löbner 1985, Prince 1992). These authors give us taxonomies of the different uses of the definite article. Based on these previous works we ran two experiments in annotating definite description uses whose goals were: 1. to observe the distribution of the differe...
Ontology Learning and Semantic Annotation: a Necessary Symbiosis
Semantic annotation of text requires the dynamic merging of linguistically structured information and a “world model”, usually represented as a domain-specific ontology. On the other hand, the process of engineering a domain ontology through a semi-automatic ontology learning system requires the availability of a considerable amount of semantically annotated documents. Facing this bootstrapping p...